library('tidyverse')
## Warning: package 'tidyverse' was built under R version 3.5.2
## ── Attaching packages ───────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## Warning: package 'stringr' was built under R version 3.5.2
## Warning: package 'forcats' was built under R version 3.5.2
## ── Conflicts ──────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library('RColorBrewer')
library('LIT')

Intro – Information vs Data

What is the difference between data and information? This question might easily be mistaken for a very lame riddle, as these terms are often used interchangeably in science, but entire subfields of data science are dedicated to answering this puzzle. To avoid invoking an undue amount of mathematical tedium in defining these terms, we’ll instead attempt to distinguish these concepts through a simple example.

Suppose we want to explore the gait dynamics of a dairy cow using a leg-mounted accelerometer that logs a measurement of total acceleration once every 0.1 seconds. The grad student implementing the protocol, however, has been hurt before by faulty sensors, and decides to strap two sensors to the same leg. Now suppose we plot this data and it looks something like Figure 1A. Can we learn anything about this cow from the second sensor that we could not infer from data provided by the first sensor. Excluding a trivial amount of measurement error, the two signals are virtually identical, so no – even though we have doubled the size of our dataset by adding a second sensor, we have not collected any additional information about the gait dynamics of this cow.

As this example has hopefully highlighted, data can be considered as a physical resource – something that can be measured in bytes and memory space. Information, on the other hand, should be regarded as a more nebulous unit of measure that represents how much can be learned from a data set. In experimental biology, collecting measurements is usually quite costly, so sampling strategies are employed to avoid any redundancy between datapoints, such that data and information are often equivalent terms. With sensor technologies, however, sampling strategies are often dictated by the hardware, which can result in a considerable amount of redundancy between datapoints. With such datasets, it is often possible to employ a information compression strategy to reduce the size of the data set without losing any information about the system. Returning to our previous example, suppose we average the output of our two temporally synchronized sensors, as illustrated in Figure 1B. We would succeed in reducing the size of our data set by half, and no details about the signal collected would be scarified.

Now let’s suppose that the grad student implementing this protocol, their ego bolstered by these pretty graphs, returns to the barn for another round of data collection. The cow takes offense to their overconfidence, and in the ensuing battle to re-attach the two leg sensors, one sensor is placed upside down on the leg. As a result, the signals become inverted, as illustrated in Figure 1C. Thus, when the grad student returns to the lab and runs this new data back through the mean filter validated in their pilot study, the signals cancel out and is lost, as shown in Figure 1D. While this example is ultimately fairly trivial – the grad student could easily correct their code and recover this signal once the source of the error is discovered – it should still serve to illustrate how easily important information can be lost when the assumptions employed in an information compression strategy do not match the data to which it is applied.

walkdat <- read.csv('Walk_120/Walk_120/LF/Run2.csv')

mystart <- 10000-50
mylag <- 120 * 2.2
walksub <- data.frame(Time = 0:mylag, Sensor1 = walkdat$Gyr_X[mystart:(mystart+mylag)])
rm(walkdat)

# Figure 1A

set.seed(61916)
walksub$Sensor2 <- walksub$Sensor1 + runif(nrow(walksub), -15, 15)
walksub2 <- tidyr::gather(walksub, Sensor, Acceleration, Sensor1, Sensor2)

fig1a <- ggplot(walksub2, aes(x = Time, y = Acceleration, col = Sensor)) +
                geom_line(alpha = 0.75) + 
                scale_color_manual(values=c('Blue','Red')) +
                theme(legend.position="none") +
                ggtitle('Trial 1: Raw Sensor Output')


# Figure 1B

walksub$Acceleration <- (walksub$Sensor1 + walksub$Sensor2)/2

fig1b <- ggplot(walksub2, aes(x = Time, y = Acceleration)) + 
                geom_line(alpha = 0.75, color = 'Purple') +
                ggtitle('Trial 1: Averaged Sensor Output')


# Figure 1C

walksubn <- walksub[,c('Time','Sensor1','Sensor2')]
walksubn$Sensor2 <- -1 * walksubn$Sensor2
walksub2n <- tidyr::gather(walksubn, Sensor, Acceleration, Sensor1, Sensor2)

fig1c <- ggplot(walksub2n, aes(x = Time, y = Acceleration, col = Sensor))+
                geom_line(alpha = 0.65) +
                scale_color_manual(values=c('Blue','Red')) +
                theme(legend.position="none")+
                ggtitle('Trial 2: Raw Sensor Output')


# Figure 1D

walksubn$Acceleration <- (walksubn$Sensor1 + walksubn$Sensor2)/2

fig1d <- ggplot(walksubn, aes(x = Time, y = Acceleration)) + 
                geom_line(alpha = 0.65, color = 'Purple') +
                ggtitle('Trial 2: Averaged Sensor Output')


# Combine Figure 1

fig1 <- ggpubr::ggarrange(
           plotlist = list(fig1a, fig1c, fig1b, fig1d),
           nrow = 2, 
           ncol = 2)
plot(fig1)

ggsave(file="Viz/Fig1.jpeg", 
         fig1, width = 15, height = 12, dpi = 300)

A Simulation-Based Case Study: Model-Dependent Vs Model-Free Information Compresion

In practice, animal scientists are seldom forced to work with raw senor data, as in the previous example. And there are of course more sophisticated models than a simple mean. To further illustrate the fundamental differences between model-dependent and model-free approaches, lets consider a more practical example. Suppose a farmer has a group of 100 cows that are overstocked at a 2:1 ratio in a freestall barn, and is concerned about the welfare of his animals. Each cow is fitted with a leg-mounted accelerometer for estrus detection, which also returns a daily measurement for the proportion of the day that each animal spends lying down. The farmer provides a 60-day dump of this data to their welfare consultant, and asks if they should be concerned about any of their cows?

Using monte carlo simulations we have generated a data set that might be produced from such a scenario. The details for this code are provided in Supplemental Materials, but we will here only consider the question how to approach analysis of this data set when ignorant of its underlying structure.

nh <- 50
nc <- 50
ndays <- 60
sday <- 30

simdatday <- array(NA,
                   dim = c(nh+nc,ndays),
                   dimnames = list(
                     c(paste('Heifer', 1:nh, sep = ''),
                       paste('Cow', 1:nc, sep = '')),
                     paste('Day',1:ndays)
                   )) 

set.seed(61916)
for(i in 1:(nh+nc)){
  
  if(i <= nh){ # sim heifer
    
    # simdatday[i, 1:sday] <- round(runif(length(1:sday), 
    #                                     0.2*1440, 0.4*1440))
    # simdatday[i, (sday+1):ndays] <- round(runif(length((sday+1):ndays), 
    #                                       0.4*1440, 0.6*1440))
    
    temp <- rnorm(1, 0.3, 0.02)
    simdatday[i, 1:sday] <- rnorm(length(1:sday), 
                                         temp, 0.05)
    temp <- rnorm(1, 0.5, 0.02)
    simdatday[i, (sday+1):ndays] <- rnorm(length((sday+1):ndays),
                                          temp, 0.05)
    
  }else{ # sim cow
    
    # simdatday[i, (sday+1):ndays] <- round(runif(length((sday+1):ndays), 
    #                                     0.2*1440, 0.4*1440))
    # simdatday[i, 1:sday] <- round(runif(length(1:sday), 
    #                                       0.4*1440, 0.6*1440))
    
    temp <- rnorm(1, 0.5, 0.02)
    simdatday[i, 1:sday] <- rnorm(length(1:sday), 
                                         temp, 0.05)
    temp <- rnorm(1, 0.3, 0.02)
    simdatday[i, (sday+1):ndays] <- rnorm(length((sday+1):ndays),
                                          temp, 0.05)
    
  }
  
  
}

Suppose that the consultant is an animal scientist. Given their training in experimental design, we might expect that they would seek to answer this question with full statistical rigor using a linear model. Using the standard exploratory data analysis techniques, they produce the plot in Figure 2, wherein lying times for each cows are plotted against the day of observation, with a loess curve fitted to reflect the mean lying time for the entire herd at each time point. From this visualization they glean that there is a considerable amount of variation between cows, and that there certainly are animals on any given day that are not spending a sufficient amount of time lying down. There is not, however, any clear evidence in this graph that would indicate that lying time patterns within this herd are changing over time - quite the opposite actually, the herd mean is incredibly stable over the observation interval.

simdatday_l <- as.data.frame(simdatday)
names(simdatday_l) <- 1:ndays
simdatday_l$CowID <- rownames(simdatday_l)
simdatday_l$Parity <- as.factor(c(rep('Heifer', nh), rep('Cow', nc)))

simdatday_l <- tidyr::pivot_longer(simdatday_l,
                    cols = as.character(1:ndays),
                    names_to = 'Day',
                    values_to = 'LyingTime'
                    )
simdatday_l$Day <- as.numeric(simdatday_l$Day)
#simdatday_l$LyingTime <- simdatday_l$LyingTime/1440
simdatday_l$CowID <- as.factor(simdatday_l$CowID)


fig2 <- ggplot2::ggplot(simdatday_l, aes(x = Day, y = LyingTime)) + 
  geom_smooth(method = 'loess', formula = y ~ x, colour = 'red', se=F) +
  geom_point(size = 0.5) + 
  ylab('Proportion of Time Spent Lying') + 
  ylim(0.1, 0.7) + ylab('Proportion of Time Spent Lying') +
  ggtitle('Exploratory Data Visualization: Lying Time vs Day')


ggpubr::ggexport(fig2,
                         width = 3000, 
                         height = 2000, 
                         res = 300,
                         filename = 'Viz/Figure2.jpeg')
## file saved to Viz/Figure2.jpeg
fig2

Based on this visualization, the consultant constructs a mixed effect model with cow as the random effect, to account for repeated measurements and avoid pseudo replication. Since the data appears stationary, they add day of observation as a simple continuous fixed effect. Finally, based on their knowledge of factors that may affect lying time, they add a discrete fixed effect to distinguish between heifer and cows within the herd.

library('lme4')
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
mixout_day <- lmer(LyingTime ~ Parity + Day + (1 | CowID), 
                   data = simdatday_l, REML = T)
summary(mixout_day)
## Linear mixed model fit by REML ['lmerMod']
## Formula: LyingTime ~ Parity + Day + (1 | CowID)
##    Data: simdatday_l
## 
## REML criterion at convergence: -8945.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4426 -0.8755 -0.0012  0.8719  2.4255 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  CowID    (Intercept) 0.0000267 0.005167
##  Residual             0.0130815 0.114374
## Number of obs: 6000, groups:  CowID, 100
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)   4.015e-01  3.414e-03 117.585
## ParityHeifer -2.584e-03  3.129e-03  -0.826
## Day          -1.204e-04  8.526e-05  -1.412
## 
## Correlation of Fixed Effects:
##             (Intr) PrtyHf
## ParityHeifr -0.458       
## Day         -0.762  0.000

Surveying these results, they determine from the fixed effects that the average lying time for this herd is about 40% of a given animal’s day, and that the average lying time of heifers is neither statistically or biologically different from that of cows. They also confirm that there is no significant (linear) temporal trend in daily lying time patterns. In assessing the variance components estimated, variability in lying time is substantially higher within- than between-cow, which suggests that there are no consistent individual differences in the lying patterns between cows. From this result they might confirm that while cows might not get to lie down an adequate amount of time every day, there cannot be a large contingent of chronically under-rested cows in this herd.

Though this is a perfectly reasonable interpretation of this model, lets suppose that the farmer is dubious of these results, so they send the data to a new consultant. This time, however, they think to mention that that they are pretty sure the start of the grazing season fell somewhere in this observation period, at which point cows had free access to pasture from their free stall barn. This consultant, knowing that cows can react differently to pasture access, develops a similar model to the first, but this time include interaction effects in order allow the temporal dynamics to differ between the heifers.

library('lme4')
mixout_day2 <- lmer(LyingTime ~ Parity * Day + (1 | CowID), 
                   data = simdatday_l, REML = T)
summary(mixout_day2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: LyingTime ~ Parity * Day + (1 | CowID)
##    Data: simdatday_l
## 
## REML criterion at convergence: -14295.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5760 -0.6872 -0.0059  0.6941  3.3337 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  CowID    (Intercept) 0.0001569 0.01253 
##  Residual             0.0052682 0.07258 
## Number of obs: 6000, groups:  CowID, 100
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)       5.558e-01  3.216e-03  172.84
## ParityHeifer     -3.113e-01  4.548e-03  -68.45
## Day              -5.182e-03  7.652e-05  -67.72
## ParityHeifer:Day  1.012e-02  1.082e-04   93.54
## 
## Correlation of Fixed Effects:
##             (Intr) PrtyHf Day   
## ParityHeifr -0.707              
## Day         -0.726  0.513       
## PartyHfr:Dy  0.513 -0.726 -0.707
fig3 <- ggplot2::ggplot(simdatday_l, aes(x = Day, 
                                 y = LyingTime, 
                                 color = Parity)) + 
  geom_smooth(method = 'lm', formula = y ~ x, se=F) +
  geom_point(size = 0.5) + 
  ylim(0.1, 0.7) + 
  ylab('Proportion of Time Spent Lying') +
  scale_color_manual(values=c("lightslateblue", "olivedrab3")) +
  theme(legend.position = 'bottom') +
  ggtitle('Visualization of Interaction Between Day and Parity')

ggpubr::ggexport(fig3,
                         width = 3000, 
                         height = 2000, 
                         res = 300,
                         filename = 'Viz/Figure3.jpeg')
## file saved to Viz/Figure3.jpeg
fig3

With this additional information, the resulting model tells an entirely different story. With the addition of the interaction effect, age of the animals and the day of observation now have a biologically and significantly significant impact on expected lying time. The consultant can now see that, during the first half of the observation interval, the mature cows hogging the free stall spaces, forcing the heifers to stand; however, once the grazing season starts, the cows leave their free stalls to go graze, while the now foot weary heifers remain in the free stalls.

Given how much goes on under the hood in modern statistical software, its easy to think of linear models as being synonymous with statistical inferences, but that isn’t true. A linear model, at the end of the day, is just a means of producing mean and variance estimates that are conditional on additional variables. In other words, linear models are just a means of generating summary statistics - summary statistics that are more nuanced and targeted than your standard quantile summaries, but still just summary statistics. Its only after a number of additional probabilistic assumptions are made that any statistical inferences can be made about value of these model parameters relative to some null. All summary statistics, however, can and should be thought of as a form of information compression. And so, just as in our previous thought experiment, we see from this example that when we can come up with a good model for our system, a linear model can be effective in extracting and consolidating the information in our data set. When, on the other hand, we are missing or overlook key assumptions about underlying system that gives rise to the behavior observed, a linear model becomes an inefficient means of information compression that can mask important patterns in a data set.

Obviously, this example is fairly trivial, and the underlying pattern might easily have been picked up with additional EDA visualizations or a more exhaustive approach to model development. But what if the underlying temporal pattern had been more complex - what if, instead of a persistent switch in lying patterns between heifers and cow after the introduction of pasture access, animals had grazed throughout the observation window and the cows only crowded out heifers at the free stalls on days where it was too rainy to graze. The negative correlation imposed between the two groups due to resource/space restrictions, and the subsequent flip-flopping in lying patterns, would have been randomly dispersed over the observation period, and might easily have be easily overlook in simple scatter plots by date. Or what if this pattern were mediated by an unmeasured trait. For example, if this had been a pen exclusively of heifers that had been patched together from multiple rearing groups, wherein larger heifers out competed the smaller animals just as a mature cow might. There are of course ways to address each of these issues within a linear modeling framework. Where such complexities begin to compound within a system, however, it can quickly become overwhelming to account for all such contingencies within a model. Subsequently, the farther we moved from controlled experimental contexts, where all variables can be elucidated, towards the chaos and complexity of commercial farm environments, the more fundamentally challenging it becomes to use model-based approaches to information compression. Consider also, then, that by definnition an animal’s behavior is designed to serve as a beffer between their environment and their more stable internal environments. Behavioral responses, which are targets of measurments of meany livestock sensor systems, are then perhaps the least amendable to such explicit modeling.

Linear models are, ultimately, one of many classes of information compression algorithm. A number of distinct analytical approaches that rely more heavily on the power of modern computing can also be found in the still relatively young field of machine learning (ML). In discussions of information compression, however, it can be useful to distinguish such algorithms into two broader types. Supervised ML is the larger and more developed of these two subfields, as it emphasizes optimization of the predictive power of large data sets. Such algorithms employ a range of empirical methods to elucidate the structure of more nuanced predictive models, but just as with linear models, the user still must distinguish between explanatory and response variables. Subsequently, just as in linear models, such algorithms may still overlook important patterns in a data set if they are attributed to latent (or simply unmeasured) causes.

Unsupervised ML algorithms, on the other hand, employ a more open-ended approach to information compression. Such algorithms do not distinguish between explanatory and response variables, but seek only to extract and visualize the most striking nonrandom features of a data set until only noise remains. A number of algorithmic approaches can be employed to explore the latent high dimensional geometry of large data sets, but generally these algorithms fall into one of three classes. Neural networks are quite arguably the most powerful of such techniques, but they currently suffer two main drawbacks: namely, the sample sizes required to train network, and difficulties in interpreting the resulting neaural architechtures (ie - black box syndrome). Advances in transfer learning and our understanding of neural architectures may one day render these issues a moot point. At present, however, such algorithms are only well suited to analyses that can bring to bear observational units in the many hundreds, if not many thousands, which limits their practical utility in many behavioral applications.

Two additional classes of UML algorithms are suitable to samples of the scale more commonly found in modern production groups: spectral embedding and hierarchical clustering. In previous research we have discussed at greater length head to head comparisons of the performance of representative algorithms from both these classes in analyses of milking order data generated from RFID milking logs (McVey et al 2020). In this work we showed that spectral embedding algorithms are susceptible to producing artifacts when the their is not adequate diversity in the sampled data - an issues that likely is not uncommon when considering behaviors are often quite limited by spatial constraints and management schedules. Such algorithms, therefore, are perhaps more suited to problems where the behaviors express can be artificially manipulated to achive more even sampling than to observational studies on farm. Subsequently we found that heirachical clustering algorithms, which are the most computational of these three UML subtypes, offered the greatest flexibility with the lowest sampling overhead.

Hierarchical clustering (HC), while slightly imposing in its name, is in its broadest strokes a fairly intuitive algorithm. Suppose you go to feedstore and get a pack of those little plastic toy cows, stand them up spread out in a line in front of a toddler, and ask them to pair up the two most similar cows. They point to the Guernse and the Holstein, so you pull them forward and stand them together. You ask this question again, they point to the Jersey and the Brown Swiss, and you pull these forward and stand them together. You ask again, and the Angus and Herford cows are paired together. You ask again, and the two groups of dairy cows are pushed together. And finally, if you still command said toddlers attention by this point, you end up through one final pairing with all your cows in one pile. If you kept track of all these pairings, you could then visualize this decision making process via a dendrogram, which would in turn provide a succinct 2D representation of how a range of phenotypic features are distributed within your wee plastic herd.

breedmat <- matrix(0, nrow=6, ncol=6)
rownames(breedmat) <- c('Holstein', 'Guernsey', 'Jersey', 'Brown Swiss', 'Angus', 'Hereford')
colnames(breedmat) <- rownames(breedmat)

breedmat[1, 2] <- 1
breedmat[3, 4] <- 2
breedmat[5, 6] <- 3
breedmat[1, 3] <- 4
breedmat[1, 4] <- 4
breedmat[2, 3] <- 4
breedmat[2, 4] <- 4
breedmat[1, 5] <- 5
breedmat[1, 6] <- 5
breedmat[2, 5] <- 5
breedmat[2, 6] <- 5
breedmat[3, 5] <- 5
breedmat[3, 6] <- 5
breedmat[4, 5] <- 5
breedmat[4, 6] <- 5

breedmat <- breedmat + t(breedmat)
breedmat
##             Holstein Guernsey Jersey Brown Swiss Angus Hereford
## Holstein           0        1      4           4     5        5
## Guernsey           1        0      4           4     5        5
## Jersey             4        4      0           2     5        5
## Brown Swiss        4        4      2           0     5        5
## Angus              5        5      5           5     0        3
## Hereford           5        5      5           5     3        0
hout <- hclust(as.dist(breedmat), method = 'single')
plot(hout)

jpeg('Viz/Figure4.jpeg', res = 300, 
     units = 'in', width = 10, height = 8)
plot(hout, main = 'Dendrogram of Breeds',
     xlab ='',
     sub = '')
dev.off()
## quartz_off_screen 
##                 2

Heirarchical clustering algorithms seeks to mimick this agglomerative clustering process that we can implement intuitively using mathematical constructs. To demonstrate how this might work, lets return to our farmer with overstocked dairy cows. As before, our larger mature cows will monopolize the freestalls when they are not on pasture, but this time we will increase the complexity of this example by supposing that animals will always have access to pasture, but that rain events dispersed seemingly randomly throughout the observation window will be the environmental factor driving inversions in lying times between heifers and cows. To make this even more difficult, lets also suppose that this farmer doesn’t have any weather station data handy either.

The first step in heirarchically clustering this data will be to compute a dissimilarity matrix, which is a square symmetric matrix with zeros along the diagonal (you can’t be different from yourself) and at all other matrix indices the pairwise dissimilarity between the observational units in the corresponding row and column indices. In order to cluster cows together with similar lying patterns, we will first calculate a dissimilarity matrix between cows using the Euclidean distance or L2 norm, which is just the squared differences in observed lying times summed over all observation days. Similarly, in order to cluster days together that wherein the herd demonstrated similar lying patterns, we’ll calculate the Euclidean distance between each pair of observation days as the sum of squared differences in lying time over all animals in the herd. Using these pairwise dissimilarity values, a ground-up agglomeration algorithm can then be applied to exhaustively group all observations through successive pairings. Here we will use Ward’s (D2) method. This stratedgy, when used in conjunction with eulidean distance estimates, pays homoage to traditional analysis of variance techniques by choosing at each pairing to combine clusters that produce the largest increase in between-group variance.

In order to visualize the results of both clustering routines simultaneously, we’ll here have to abandon more conventional scatter plot representations this data and instead directly visualize this simulated data matrix using heatmaps. Here each row will correspond to a cow, each column will correspond to an observation day, and each cell will be colored to represent the proportion of time that a given cow spends lying down a given day. In Figure 5A, with rain events scattered randomly throughout the observation window, and cow records provided in no particular order, the clear inversions in lying patterns that we know exist in this data set are completely obscured.

The dendrograms produced by clustering cows over days and also days over all cows are provided. Given that the length of the branches here represent the dissimilarity between between clusters, we can see clearly that in both dendrograms the clustering algorithms have picked up two distinct groups of cow and two distinct groups of observation days. We can now use these dendrograms to re-order the rows and columns indices of our data matrix so the cows with similar temporal patterns in their lying times are grouped together on the row axis, and days with similar herd lying dynamics are placed next to each other on the row axis. Visualizing these results again using a heatmap in Figure 5B, the inversion in lying patterns between cow and heifer subgroups are now quite visually striking. Not only have we succeeded in recovering the social dynamics within this data set using through hierarchical clustering, we’ve done so without ever having to tell this algorithm what factors are driving this behavioral dynamic. Thus, even though we did not have precipitation records, and even if we had not known the parity of our animals, we still would be able identify that overstocking is compromising the welfare of this herd.

# randomize order that data is recorded

set.seed(61916)
simdatday2 <- simdatday[,sample(1:ncol(simdatday))]
colnames(simdatday2) <- colnames(simdatday)
simdatday2 <- simdatday2[sample(1:nrow(simdatday2)),]


pheatmap::pheatmap(simdatday2,
                   cluster_cols = F,
                   cluster_rows = F,
                   #filename = 'Viz/Figure5A.jpeg',
                   width = 7,
                   height = 10,
                   fontsize = 6,
                   main = 'Heatmap Visualization of Raw Data')

temp <- pheatmap::pheatmap(simdatday2,
                   cluster_cols = F,
                   cluster_rows = F,
                   filename = 'Viz/Figure5A.jpeg',
                   width = 7,
                   height = 10,
                   fontsize = 6,
                   main = 'Heatmap Visualization of Raw Data',
                   silent = T)

plot(temp$gtable)
# cluster rows

hout_row <- hclust(dist(simdatday2), method = 'ward.D2' )
plot(hout_row, 
     main = 'Clustering Dendrogram: Cows',
     xlab = 'Between-Cow Euclidean Distance',
     sub='')

# jpeg('Viz/Figure5B.jpeg', res = 300, 
#      units = 'in', width = 16, height = 8)
# plot(hout_row, 
#      main = 'Clustering Dendrogram: Cows',
#      xlab = 'Between-Cow Euclidean Distance',
#      sub='')
# dev.off()


# cluster cols

hout_col <- hclust(dist(t(simdatday2)), method = 'ward.D2' )
plot(hout_col, 
     main = 'Clustering Dendrogram: Days',
     xlab = 'Between-Day Euclidean Distance')

# Visualize Joint Clustering


pheatmap::pheatmap(simdatday2,
                   cluster_cols = hout_col,
                   cluster_rows = hout_row,
                   filename = 'Viz/Figure5B.jpeg',
                   width = 7,
                   height = 10,
                   fontsize = 6,
                   main = 'Heatmap Visualization of Clustered Data')

temp <- pheatmap::pheatmap(simdatday2,
                   cluster_cols = hout_col,
                   cluster_rows = hout_row,
                   #filename = 'Viz/Figure5B.jpeg',
                   width = 7,
                   height = 10,
                   fontsize = 6,
                   main = 'Heatmap Visualization of Clustered Data',
                   silent = T)

plot(temp$gtable)

A Simulation-Based Case Study: Model-Dependent Vs Model-Free Bivariate Associations

The preceding example has hopefully illustrated that, by leveraging intrinsic co-dependencies in the behavioral responses of socially housed animals, model-free machine learning approaches can recover complex behavioral patterns from sensor data absent any assumptions of the causative factors. In practice, however, once such a pattern is detected, we typically want to try to pin down the variables eliciting such reactions. In the preceding simulation, the mechanism linking lying time, age, and weather were pretty simple, and so armed with the insights from the UML analyses, a number of fairly straightforward linear methods might be employed to probe for the causative variables amongst farm records. But what if the link between environmental variables and the behavioral responses had been more complex? We’ve already seen in the previous example that model-based methods can overlook significant bivariate associations when the structure of the assumed model does not match the dynamics of the system, so let’s then consider how model-free approaches might be extended to characterizing complex behavioral patterns not only within but also across data sets.

For applications where intuition is more important than prediction, information-theoretic approaches may provide a model-free approach to both identifying and characterizing bivariate associations between datasets that naturally compliments UML clustering techniques. Information entropy, as calculated using information entropy, is an estimator designed to quantify the relative uncertainty in a categorical response variable (Shannon 1948; MacKay 2003). The discrete values that may be recorded for such a variable is often called the “vocabulary” or “dictionary”, and are here indexed as k = 1….K, wherein K is the total number of discrete levels. For applications in animal behavior, however, this dictionary is typically just the ethogram used. While logarithmic terms are not typically known for their interpretability, they here confer entropy estimates a number of nice algebraic properties that makes this estimator intuitive as a relative measure.

Suppose a dataset were collected wherein each observational unit is coded with one of two mutually exclusive behaviors – for example lying or not lying. If both behaviors are observed at equivalent frequencies, and we are provided no additional information than the distribution of this discrete variable, could we make any type of informed guess about what behavior we might see if we selected an observation at random from this data? Since no behavior is no more likely to be observed than another, we would be completely uncertain, and subsequently would compute a maximum theoretical entropy of log2(2) = 1. Alternatively, if some behavioral mechanism caused this probability to shift towards or away from one of these two behaviors, then we might be able to venture a guess at what behavior we might expect to see if we picked an observation at random – our uncertainty would decrease and so too would the corresponding entropy estimate. Taken to the extreme, if only one behavior were ever observed, then we could guess the encoded value of any observation in our data set with complete certainty, and so the corresponding entropy estimate would drop to zero. Figure 6 demonstrates how the entropy estimate of bivariate variable changes with the symmetry of the distribution of the observed encoding. In calculating such entropy values, the resulting estimates are contingent only upon the assumptions used to develop the discretization scheme. No conditional mean or any other model need be assumed, as with a standard variance estimator utilized in conventional statistical inference frameworks for linear models

# entropy plot

pdist <- data.frame(a = seq(0,0.5, by = 0.001))
pdist$b <- 1 - pdist$a
pdist$H <- apply(pdist, 1, function(x) calcEntropy(x, isfreq = T)$Entropy )

entplot <- ggplot(pdist, aes(x = a, y = H)) + 
  geom_point(col = 'red') +
  xlab('P(X = A)') + 
  ylab('Entropy (H)')


# his max

histmax <- qplot( c(rep('A', 50), rep('B', 50)), 
                  ylab = 'Proportion of Observations',
                  xlab = 'Observed Behavior',
                  ylim = c(0,100))

# hist mix

histmix <- qplot( c(rep('A', 75), rep('B', 25)), 
                  ylab = 'Proportion of Observations',
                  xlab = 'Observed Behavior',
                  ylim = c(0,100))

# his min

histmin <- ggplot(data.frame(d = factor(c(rep('A', 100), rep('B', 0)), 
                        levels = c('A', 'B'))), 
                  aes(d)) + 
                  geom_histogram(stat = 'count') + 
                  ylab('Proportion of Observations') +
                  xlab('Observed Behavior') + 
                  scale_fill_discrete(drop=FALSE) +
                  scale_x_discrete(drop=FALSE)
## Warning: Ignoring unknown parameters: binwidth, bins, pad
                  ylim(c(0,100))
## <ScaleContinuousPosition>
##  Range:  
##  Limits:    0 --  100
# create combo hist

combohist <- ggpubr::ggarrange(
  histmin,
  histmix,
  histmax,
  ncol = 3, nrow = 1)


# create combo plot

comboplot <- ggpubr::ggarrange(
  combohist ,
  entplot,
  ncol = 1, nrow = 2)

comboplot

ggpubr::ggexport(comboplot,
                         width = 12,
                         height = 8,
                         res = 300,
                         filename = 'Viz/Figure6.pdf')
## file saved to Viz/Figure6.pdf

This model-free framework can subsequently be extended to a multivariate estimator with two or more discrete variables. In the bivariate case, the distribution of one variable is compared across each encoded level of the other in order to decompose the total entropy in the joint encoding into three terms: the conditional entropy unique to the first variable, the conditional entropy unique to the second variable, and the mutual information that is redundant between the two encodings (MacKay 2003). This mutual information estimate in turn reflects how much information we learn about one encoded variable if we know the value of the other, and subsequently cam be used to reflect the strength of a bivariate association between two sets of encoded data regardless of the underlying dynamic – linear, quadratic, exponential, etc. To demonstrate the efficacy of an information-theoretic framework for identifying complex bivariate associations over more conventional model-based approaches, lets return to the previous example. Suppose our farmer with the overstocked cows, now fully aware of the welfare issues this management choice has created, reduces their stocking rate to a 1:1 ratio and continues to monitor the lying time of the animals to provide proof to their milk buyer that the issue has been resolved. After several months at this lower stocking rate, they review the data and are dismayed to find that there are still days where animals have inadequate lying times. To solve this new problem, they hire yet another consultant to help them track down the source of the problem. Suppose that this new data set were collected in the summer, and so naturally, this consultant includes Temperature-Humidity Index (THI) as one of many candidate variables to consider as a potential source of the continued welfare concerns for this herd (Tucker et al. 2021). Using stochastic sampling techniques, the full details of which are provided in Supplemental Materials, we have simulated a fairly straight forward but nonlinear dynamic between these two variables. On days where the THI is low, animals are not heat stressed and so spend the majority of their day out on pasture grazing. As the THI rises, cows become heat stressed for progressively larger proportions of the day, resulting in a gradual increase in the proportion of each day that cows spend lying down in the shade of their freestall barn. Above a certain high THI threshold, however, cows struggle to thermoregulate while lying down, causing them to stand for extended periods of time. When lying time is plotted against THI, as in Figure 7A, we can see that there is a clear nonrandom pattern in this data that is perhaps best characterized by a threshold model – a dynamic that is commonly found when a single behavioral response is subject to the influence of competing underlying behavioral response mechanisms. If a simple linear effect were utilized to probe for a significant bivariate association between these two variables, however, a near-zero slope would be returned, as shown by the red line overlaid in Figure 7A. In this case, not only would a Pearson correlation test (R = -0.03, p = 0.25) fail to identify this nonrandom but also nonlinear pattern, but because this pattern is also not monotonically increasing, even a nonparametric Spearman Rank correlation test (Rho = 0.03, p = 0.12) would fail to identify THI as a significant influence on lying times within this herd.

set.seed(61916)

simthi <- data.frame(THI = rep(seq(75, 95, by = 1), each = 100),
                     LyingTime = rep(seq(40,80, by = 2), each = 100))
simthi$LyingTime[simthi$LyingTime >= 74] <- 30
simthi$LyingTime <- round(simthi$LyingTime + rnorm(nrow(simthi), mean = 0, sd = 5))


testcor <- cor.test(simthi$THI, simthi$LyingTime, method = "pearson")
testcor
## 
##  Pearson's product-moment correlation
## 
## data:  simthi$THI and simthi$LyingTime
## t = -1.1474, df = 2098, p-value = 0.2514
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.06774369  0.01775153
## sample estimates:
##         cor 
## -0.02504187
testspear <- cor.test(simthi$THI, simthi$LyingTime, method = "spearman")
## Warning in cor.test.default(simthi$THI, simthi$LyingTime, method = "spearman"):
## Cannot compute exact p-value with ties
testspear
## 
##  Spearman's rank correlation rho
## 
## data:  simthi$THI and simthi$LyingTime
## S = 1491800000, p-value = 0.1247
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.03351306
texttemp <- paste('Cor: ', round(testcor$estimate, 2), 
                         '(p=', round(testcor$p.value, 2), ')',
                         '\nRho: ', round(testspear$estimate, 2),
                         '(p=', round(testspear$p.value, 2), ')',
                         sep = '')
anno <- data.frame(x = 76, y = 82,
                   text = texttemp)

fig7a <- ggplot(simthi, aes(x = THI, y = LyingTime)) + 
  geom_jitter() + 
  geom_smooth(method='lm', formula= y~x, col = 'red', se = F) + 
  ylab('Proportion of Time Lying') + 
  geom_text(data = anno,
            aes(x = x, y = y, label = text)
            )
fig7a

ggsave('Viz/Figure7A.jpeg',
       plot = fig7a,
       width = 8,
       height = 5,
       units = 'in',
       dpi = 300)

Let’s consider instead a nonparametric testing framework using information entropy as the test statistic (McVey et al. 2021). Suppose both the THI and lying time measurements are discretized using simple equal-sized binning rules. If we compare the mutual information estimate from the two observed encodings against estimates generated from a simple permutation, the resulting p-value for this tests of bivariate association would be highly significant (p <= 0.001). To further characterize this dynamic, a simple contingency table, wherein each cell represents the total number of observations for each joint encoding, can be easily visualized as in Figure 7B. The mutual information estimate for the overall bivariate association can subsequently be decomposed into pointwise mutual information estimates to reflect how much each cell in the observed table differs from the counts that would be expected by multiplying the marginal probabilities, which would be the distribution of joint observations anticipated if no association existed between the two encodings (see McVey et al. 2021 for further details). Here blue cells indicate that there are fewer observations with the corresponding joint encoding than would be expected if no association between these variables were present, whereas orange cells are over-represented relative to the null. From this visualization we can clearly see that the probability of observing a given lying pattern is being shifted in different directions based on the level of the THI encoding. Thus, absent any prior intuition or assumptions about the relationship between these two variables, an information theoretic approach has successfully identified a significant bivariate associations and provided insights into the underlying dynamic to inform further interpretation of the underlying behavioral mechanisms at play and subsequently the correct management interventions needed to remediate this welfare concern.

set.seed(61916)

simthi$THI_Cat <- cut(simthi$THI, 
                      breaks = seq(75, 95, by = 5),
                      include.lowest = T)
simthi$LyingTime_Cat <- cut(simthi$LyingTime, 
                      breaks = seq(15, 85, by = 10),
                      include.lowest = T)
simthi$LyingTime_Cat <- factor(simthi$LyingTime_Cat,
                               levels=rev(levels(simthi$LyingTime_Cat)))

miout <- MITest(simthi[,'THI_Cat', drop=F],
                simthi[,'LyingTime_Cat', drop=F])
miout$pval
## [1] 0
compareEncodings(simthi[,'THI_Cat', drop=F],
                 simthi[,'LyingTime_Cat', drop=F],
                 colorby = 'PMI',
                 simPMI = 'NullPerm', 
                 alpha = 0.05,
                 nsim = 2000,
                 gradient_color_low = 'steelblue',
                 gradient_color_high = 'orange',
                 filename = 'Viz/Figure7B',
                 fontsize = 30,
                 imheight = 35,
                 imwidth = 40)
## file saved to Viz/Figure7B.pdf

An Empirical Case Study: Pattern detection with the LIT Toolkit

While the previous simulation study has hopefully helped to highlight the potential benefits of embracing of a more open-ended model-free approach to knowledge discovery in a large and poorly characterized data set, the behavioral pattern recovered through heirarchical clustering in this synthetic data set is ultimately far simpler than the complex web of behavioral responses that might be elicited from the environmental and social dynamics encountered in a working farm environment. To demonstrate that these concepts can stand up to the scrutiny of flesh and blood bovines, we’ll now attempt to demonstrate the utility of this analytical approach by working through the analysis of some real data.

The Organilac dataset was collected in 2017 from a feed trial assessing the impact of an organic fat supplement on cow health and productivity during the first 150 days of lactation. The study ran from January through July on a USDA Certified Organic dairy in Northern Colorado, and all animal handling and experimental protocols were approved by the Colorado State University Institution of Animal Care and Use Committee (Protocol ID: 16-6704AA). Full details on the experimental design can be found in Manriquez et al. (2018) and Manriquez et al. (2019). A total of 200 cows were over a 1.5-month period into a mixed-parity herd of animals with predominantly Holstein genetics. Cows were maintained in a closed herd for the duration of the trial in an open-sided free-stall barn, which was stocked at roughly half capacity with respect to both feed bunk spaces and stalls. Cows had free access to an adjacent outdoor dry lot while in their home pen, and beginning in April were moved onto pasture at night, to comply with organic grazing standards. Cows were milked three times a day, with free access to TMR between milkings. Health and fertility outcomes were closely monitored. Cows were fitted with a CowManager ear tag accelerator (Agis Automatisering BV, Harmelen, The Netherlands), which provided hourly estimates for the total time that cows were recorded as engaging in 5 mutually exclusive behaviors - eating, rumination, non-activity, activity, and high activity. Cows were also milked in an RFID-equipped rotary parlor (DeLaval, Tumba, Sweden), which not only quantified daily milk production levels, but also recorded milking order, or the or the sequence in which cows enter the parlor to be milked.

Milking order has been the focus of dozens of papers since 1960’s. Nearly all these studies have confirmed that these queuing patterns are nonrandom and surprisingly consistent over time. Subsequently, researchers have speculated that factors such as dominance, health, energy balance and productivity may all influence milking order, but the empirical evidence has been highly inconsistent and difficult to replicate. This assorted history is likely attributable to the fact that this system violates just about every assumption required of a linear model - stationary, homogeneity, independence, etc - making it an excellent system against which to test the efficacy of model free approaches. In previously published work, we demonstrated the efficacy of HC in characterizing queing patterns over conventional model-based and other UML techniques. In subsequent work we have built significantly upon this preliminary work to develop a more complete analytical pipeline that is fully implemented with open source code within the R programming environment. In this paper, we will return to this milking order data set to demonstrate how these purpose-built algorithms can be implemented in analyses of PLF data sets to reveal additional details about this data set.

There is a motto in data science that there is “no free lunch” - that no one algorithm can be optimally applied across all data sets. While a basic implementation of hierarchical clustering was sufficient in our previous simulation study to extract a fairly straight forward pattern, statisticians have concocted a range of modifications to this basic implementation to provide more nuanced performance - namely a range of linkage (agglomeration) rules and a wide array of pairwise dissimilarity estimators that bounded only by the range of ones imagination. So much flexibility is at first overwhelming, but provides two major advantages. The first is that HC algorithms can easily be modified to accomodate the staggaring array of data formats (continuous, discrete, multinomal, etc) returned by various precision livestock farming technologies. The second, and perhaps most important, is there is room to enfuse into these algorithms more fundamental principals and assumptions about animals behavior and biology in order to emphasize particilar patterns in a data set without having to state a comprehensive model.

The LIT package has be purpose-built to facilitate model free analysis fo large complex livestock sensor datasets (McVey et al 2021). To see how this algorithmic pipeline can be leveraged to recover complex nonlinear patterns from PLF datastreams, lets walk through the analysis of the relationship age, yield, and parlor queuing patterns.

First lets load up the queuing pattern results from McVey et al 2020

load('../Data/EntryOrderEncodings.RData')

Alright, and now lets load up the cow attributed data

load('../Data/CowAttributes.RData')

And now the fertility data

load('../Data/FertilityData.RData')

Analysis of Age x Yield

First, we’ll create a joint encoding for age and productivity. In an organic herd, it can be difficult to get some cows bred without the help of hormonal therapies, resulting in extended lactation windows. This means that the parity number and the age of animals are not always directly comparable, as with most conventional herds, so here we’ll use age in days to represent the relative maturity of our cows. To represent the productivity of these animals I’ve extracted the 95th quantile of daily millk yield for each cow, which here will serve as a model-free estimate of peak yield.

# read in data

datmat <- scale(cowattribdat[ , c("AgeDaysOld", "MilkYield95")])
rownames(datmat) <- cowattribdat$CowID # set row names to cow ID
datmat <- datmat[complete.cases(datmat),] # center and scale each variable

# annotate rows with lactation number

tempdat <- fertdat[ ,"LactN", drop=F]
names(tempdat) <- 'Lactation'
tempcol <- list(
  Lactation  = brewer.pal(max(tempdat$Lactation, na.rm = T),
                          "Purples")) 

# create joint encoding

encoding_agemilk <-  encodePlot(datmat, 
                                 dist(datmat),
                                 dist(t(datmat)),
                                 n_rowclusters = 1:10, 
                                 n_colclusters = 0,
                                 plot_title = 'Joint Encoding of Cow Age & Peak Yield',
                                 showplot = F, 
                                 exportplot = T,
                                 imwidth = 12,
                                 imheight = 18,
                                 imres = 300,
                                 filename = "AgeYield/Encoding/AgeYieldEncoding",
                                 exportdendrogram = F
                                 ,border_color = 'grey'
                                 ,annotation_row = tempdat
                                 ,annotation_colors  = tempcol
                                 ) 


plot(encoding_agemilk$nrow_10_ncol_0$plot)

As anticipated, there is quite a bit of redundancy between age and yield. Where this correlation might destabilize a linear model, this redundancy has been leveraged by the HC algorithm implemented for us in the encodePlot utility. Closer inspection reveals that the first three cuts of the dendrogram distinguish between parity 1 animals, parity 2 animals (with a few parity 3 animals mixed in), and parity 3+ animals. All subsequent cuts serve either to distinguish between yield levels within these age ranges, or to further distinguish age groups amongst the older animals in this herd. The encoded values with discretization granularities from 2:10 are automatically created and stored for us within the encodePlot output, which is a structured list that is specially formated to pass directly to downstream LIT algorithms

Bivar Analysis - Healthy Cows

And now lets see if there is a significant association between this age-yield encoding and our milking order encoding. First we’ll look at encodings created with only healthy cows (to remove any counfoudning behavioral dynamics related to disease states).

set.seed(61916)

bvt_agemlk_dmh <- bivarTreeTest(
                    encoding_agemilk,
                    dmgridout_healthy,
                    grid_x = 2:10, 
                    grid_y = 2:10,
                    estimator = 'shrink',
                    rescaleMI = T,
                    showplot = T) 

ggsave('AgeYield/Bivar/MetaParamViz_DMHealthy.jpeg',
       plot = last_plot(),
       width = 8,
       height = 5,
       units = 'in',
       dpi = 300)

bvt_agemlk_dmh$PVals
## , , XClust_ncol_0, YClust_ncol_1
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.265 0.599 0.499 0.658 0.763 0.826 0.656 0.813 0.391
## Kx=3  0.082 1.000 0.473 0.658 1.000 0.843 0.522 0.360 0.212
## Kx=4  0.132 0.135 0.395 0.544 0.414 0.404 0.220 0.160 0.124
## Kx=5  0.178 0.319 0.501 0.752 0.693 0.626 0.398 0.347 0.224
## Kx=6  0.147 0.468 0.387 0.625 0.811 0.732 0.576 0.509 0.374
## Kx=7  0.266 0.531 0.701 0.747 0.825 0.817 0.623 0.529 0.491
## Kx=8  0.359 0.665 0.800 0.818 1.000 0.852 0.825 0.772 0.703
## Kx=9  0.289 0.580 0.697 0.748 0.803 0.791 0.769 0.700 0.624
## Kx=10 0.327 0.710 0.794 0.802 0.892 0.824 0.759 0.699 0.666
## 
## , , XClust_ncol_0, YClust_ncol_2
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.415 0.888 0.909 0.991 0.794 0.058 0.404 0.424 0.668
## Kx=3  0.361 0.413 0.700 1.000 1.000 0.155 0.354 0.427 0.390
## Kx=4  0.132 0.268 0.397 0.519 0.391 0.036 0.162 0.220 0.215
## Kx=5  0.212 0.403 0.435 0.751 0.650 0.207 0.354 0.497 0.528
## Kx=6  0.202 0.375 0.431 0.730 0.736 0.267 0.530 0.721 0.805
## Kx=7  0.302 0.422 0.691 0.793 0.799 0.406 0.665 0.553 0.557
## Kx=8  0.454 0.615 0.835 0.903 1.000 0.437 0.762 0.649 0.593
## Kx=9  0.361 0.579 0.767 0.849 0.750 0.369 0.705 0.577 0.529
## Kx=10 0.455 0.635 0.881 0.886 0.796 0.548 0.779 0.564 0.608
## 
## , , XClust_ncol_0, YClust_ncol_3
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.827 0.695 0.566 0.592 0.192 0.051 0.791 0.642 0.402
## Kx=3  0.301 0.316 0.734 0.769 0.418 0.122 0.653 0.693 0.259
## Kx=4  0.248 0.306 0.290 0.372 0.213 0.045 0.450 0.410 0.111
## Kx=5  0.547 0.270 0.399 0.627 0.395 0.162 0.573 0.694 0.268
## Kx=6  0.687 0.226 0.443 0.682 0.462 0.301 0.805 0.854 0.530
## Kx=7  0.303 0.510 0.744 0.637 0.779 0.450 0.888 0.880 0.389
## Kx=8  0.375 0.721 0.866 0.776 0.580 0.512 0.899 0.866 0.370
## Kx=9  0.384 0.639 0.804 0.689 0.515 0.433 0.844 0.815 0.311
## Kx=10 0.427 0.607 0.912 0.830 0.538 0.489 0.923 0.809 0.391
## 
## , , XClust_ncol_0, YClust_ncol_4
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.827 0.889 0.835 0.948 0.525 0.846 0.248 0.343 0.677
## Kx=3  0.301 0.323 1.000 1.000 0.732 0.503 0.243 0.263 0.422
## Kx=4  0.248 0.144 0.336 0.480 0.329 0.300 0.070 0.099 0.393
## Kx=5  0.547 0.241 0.347 0.687 0.596 0.413 0.170 0.163 0.506
## Kx=6  0.687 0.090 0.299 0.762 0.584 0.648 0.287 0.417 0.805
## Kx=7  0.303 0.130 0.573 0.766 0.722 0.455 0.391 0.568 0.836
## Kx=8  0.375 0.220 0.707 0.891 0.732 0.569 0.644 0.716 0.902
## Kx=9  0.384 0.194 0.612 0.827 0.653 0.515 0.577 0.644 0.871
## Kx=10 0.427 0.254 0.758 0.822 0.737 0.653 0.496 0.725 0.916
## 
## , , XClust_ncol_0, YClust_ncol_5
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.827 0.815 0.544 0.749 0.420 0.119 0.772 0.604 0.038
## Kx=3  0.301 0.423 0.399 0.660 0.383 0.220 0.514 0.321 0.058
## Kx=4  0.248 0.275 0.510 0.238 0.233 0.047 0.318 0.110 0.018
## Kx=5  0.547 0.358 0.697 0.445 0.312 0.119 0.547 0.133 0.075
## Kx=6  0.687 0.192 0.537 0.585 0.407 0.248 0.768 0.230 0.183
## Kx=7  0.303 0.247 0.560 0.603 0.504 0.241 0.869 0.225 0.212
## Kx=8  0.375 0.358 0.649 0.747 0.685 0.454 0.843 0.559 0.447
## Kx=9  0.384 0.306 0.599 0.678 0.618 0.394 0.789 0.484 0.381
## Kx=10 0.427 0.372 0.762 0.864 0.681 0.412 0.929 0.594 0.459
## 
## , , XClust_ncol_0, YClust_ncol_6
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.250 0.607 0.525 0.952 0.818 0.348 0.186 0.961 0.860
## Kx=3  0.640 0.288 0.392 0.766 1.000 0.493 0.224 0.431 1.000
## Kx=4  0.230 0.113 0.170 0.236 0.443 0.254 0.071 0.226 0.362
## Kx=5  0.381 0.370 0.285 0.440 0.615 0.427 0.110 0.343 0.566
## Kx=6  0.348 0.401 0.321 0.487 0.698 0.693 0.254 0.529 0.825
## Kx=7  0.615 0.433 0.507 0.481 0.829 0.844 0.419 0.442 0.784
## Kx=8  0.634 0.509 0.691 0.497 1.000 0.871 0.518 0.706 0.906
## Kx=9  0.631 0.444 0.608 0.414 0.824 0.825 0.459 0.645 0.849
## Kx=10 0.756 0.566 0.717 0.603 0.930 0.891 0.435 0.824 0.922
compareEncodings(encoding_agemilk$nrow_4_ncol_0,
                 dmgridout_healthy$nrow_7_ncol_5,
                 colorby = 'PMI',
                 simPMI = 'NullParam',
                 alpha = 0.05,
                 nsim = 2000,
                 gradient_color_low = 'steelblue',
                 gradient_color_high = 'orange',
                 filename = 'AgeYield/Bivar/AgeYield_DMHealthy',
                 fontsize = 30,
                 imheight = 38,
                 imwidth = 30)
## file saved to AgeYield/Bivar/AgeYield_DMHealthy.pdf

The results of the bivar tree test are visually summarized via a heatmap. From this we can clearly see that a significant bivariate association between comes into resolution with four clusters for the age-yield encoding (kx) and seven clusters for the queueing position encoding (ky).

In creating the milking order encodings, the number of time clusters was varried on a grid from 1:6 (see McVey et al 2020 for details). The number of time clusters used to re-weight the between-cow dissimilarity metric will subtly influence the encoding. That means that, for each combination of encoding granularities (each cell in this metaparameter grid), there are actually six different milk order encodings created with six different time parameters. To account for this, the bivar tree test has tested all six of these encodings, and reported test results for the time parameterization with the strongest association between milking order and age-yield encodings. The purple annotations along the column axis provides a visual summary of which time parameters provided the best bivariate association for each combination of metaparameter values. Thus, we can not only optimize the granularity at which the age-yield and milk order encodings occur, but also explore the temporal complexity as well. From this we see that the strongest associations occur with two temporal subperiods for milking order, which corresponds to the separation between the pen and pasture subperiods.

With these metaparameters, we can use the compareEncodings utility to characterize this bivariate association. This will create a contingency table with counts for the joint encoding across the two data sets. This visuzalization is here paramterized so that cells will be colored by their pointwise mutual information value if the observed count varies signiticantly from the count that would be expected if no association existed between these two datastreams, as determined by a semi-parametric permutation-based multinomial test using the product of the marginal frequencies. The resulting visualization, which is availiable in supplemental materials, reveals that the oldest cows in this herd are significantly over-represented among cows that consistently enter at the very front of the queue

Bivar Analysis - All Cows

Alright, and now lets see if there is a significant association between this age-yield encoding and our milking order encoding if we consider all cows with complete records (healthy animals and those with recorded health events).

set.seed(61916)

bvt_agemlk_dma <- bivarTreeTest(encoding_agemilk,
                    dmgridout_all,
                    grid_x = 2:10, 
                    grid_y = 2:10,
                    estimator = 'shrink',
                    rescaleMI = T,
                    showplot = T) 

ggsave('AgeYield/Bivar/MetaParamViz_DMAll.jpeg',
       plot = last_plot(),
       width = 8,
       height = 5,
       units = 'in',
       dpi = 300)

bvt_agemlk_dma$PVals
## , , XClust_ncol_0, YClust_ncol_1
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.692 0.631 0.560 0.526 0.125 0.301 0.365 0.345 0.382
## Kx=3  0.078 0.221 0.179 0.298 0.136 0.341 0.315 0.324 0.361
## Kx=4  0.302 0.300 0.494 0.468 0.230 0.313 0.215 0.283 0.407
## Kx=5  0.287 0.431 0.362 0.631 0.465 0.406 0.315 0.443 0.575
## Kx=6  0.230 0.466 0.301 0.561 0.465 0.476 0.366 0.556 0.646
## Kx=7  0.369 0.348 0.273 0.437 0.524 0.482 0.335 0.474 0.637
## Kx=8  0.460 0.552 0.507 0.640 0.626 0.751 0.695 0.729 0.691
## Kx=9  0.499 0.626 0.605 0.789 0.649 0.776 0.728 0.780 0.725
## Kx=10 0.401 0.502 0.554 0.720 0.600 0.678 0.677 0.706 0.716
## 
## , , XClust_ncol_0, YClust_ncol_2
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.572 0.722 0.965 0.470 0.050 0.213 0.129 0.233 0.386
## Kx=3  0.464 0.494 0.217 0.365 0.001 0.019 0.192 0.202 0.317
## Kx=4  0.593 0.524 0.447 0.400 0.003 0.016 0.220 0.285 0.348
## Kx=5  0.649 0.688 0.232 0.593 0.005 0.051 0.380 0.412 0.574
## Kx=6  0.474 0.658 0.160 0.570 0.008 0.146 0.497 0.641 0.765
## Kx=7  0.605 0.637 0.149 0.547 0.017 0.121 0.512 0.775 0.911
## Kx=8  0.826 0.855 0.366 0.803 0.027 0.268 0.692 0.696 0.941
## Kx=9  0.794 0.898 0.423 0.908 0.042 0.434 0.752 0.748 0.872
## Kx=10 0.661 0.824 0.402 0.881 0.048 0.432 0.691 0.711 0.890
## 
## , , XClust_ncol_0, YClust_ncol_3
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.755 0.854 0.942 0.915 0.228 0.383 0.061 0.183 0.194
## Kx=3  0.448 0.413 0.192 0.601 0.188 0.306 0.120 0.264 0.354
## Kx=4  0.665 0.352 0.192 0.648 0.148 0.388 0.108 0.139 0.340
## Kx=5  0.634 0.586 0.227 0.654 0.206 0.477 0.480 0.378 0.340
## Kx=6  0.460 0.518 0.103 0.622 0.259 0.572 0.634 0.381 0.666
## Kx=7  0.576 0.316 0.067 0.634 0.322 0.831 0.431 0.389 0.765
## Kx=8  0.797 0.543 0.240 0.833 0.648 0.931 0.600 0.549 0.687
## Kx=9  0.777 0.659 0.351 0.931 0.796 0.915 0.643 0.629 0.810
## Kx=10 0.640 0.507 0.378 0.842 0.790 0.906 0.749 0.606 0.775
## 
## , , XClust_ncol_0, YClust_ncol_4
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.255 0.350 0.408 0.641 0.168 0.737 0.318 0.214 0.028
## Kx=3  0.058 0.358 0.393 0.374 0.167 0.328 0.115 0.139 0.052
## Kx=4  0.045 0.272 0.453 0.341 0.177 0.371 0.099 0.222 0.162
## Kx=5  0.196 0.697 0.312 0.408 0.288 0.341 0.088 0.502 0.338
## Kx=6  0.264 0.765 0.243 0.414 0.331 0.555 0.219 0.752 0.495
## Kx=7  0.158 0.825 0.435 0.292 0.518 0.563 0.295 0.850 0.541
## Kx=8  0.191 0.834 0.760 0.581 0.721 0.808 0.461 0.851 0.419
## Kx=9  0.339 0.882 0.858 0.707 0.894 0.863 0.521 0.906 0.682
## Kx=10 0.288 0.725 0.753 0.679 0.843 0.801 0.563 0.915 0.672
## 
## , , XClust_ncol_0, YClust_ncol_5
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.472 0.777 0.898 0.578 0.184 0.550 0.191 0.062 0.043
## Kx=3  0.213 0.221 0.151 0.323 0.146 0.269 0.110 0.025 0.017
## Kx=4  0.289 0.209 0.237 0.263 0.191 0.232 0.204 0.007 0.029
## Kx=5  0.372 0.430 0.345 0.235 0.290 0.388 0.125 0.015 0.062
## Kx=6  0.323 0.419 0.302 0.223 0.333 0.483 0.264 0.035 0.174
## Kx=7  0.411 0.386 0.374 0.189 0.220 0.762 0.484 0.037 0.203
## Kx=8  0.655 0.520 0.655 0.488 0.385 0.835 0.626 0.140 0.367
## Kx=9  0.611 0.666 0.842 0.644 0.558 0.900 0.756 0.197 0.506
## Kx=10 0.415 0.611 0.825 0.677 0.522 0.855 0.711 0.222 0.530
## 
## , , XClust_ncol_0, YClust_ncol_6
## 
##        Ky=2  Ky=3  Ky=4  Ky=5  Ky=6  Ky=7  Ky=8  Ky=9 Ky=10
## Kx=2  0.917 0.907 0.565 0.757 0.049 0.130 0.493 0.034 0.129
## Kx=3  0.069 0.352 0.104 0.335 0.050 0.269 0.211 0.033 0.054
## Kx=4  0.076 0.348 0.165 0.452 0.089 0.245 0.161 0.066 0.113
## Kx=5  0.248 0.714 0.183 0.358 0.140 0.298 0.274 0.069 0.172
## Kx=6  0.340 0.540 0.215 0.437 0.222 0.382 0.601 0.128 0.341
## Kx=7  0.152 0.600 0.294 0.306 0.216 0.305 0.579 0.096 0.405
## Kx=8  0.211 0.833 0.439 0.667 0.245 0.751 0.696 0.191 0.376
## Kx=9  0.387 0.785 0.722 0.630 0.267 0.856 0.697 0.261 0.511
## Kx=10 0.268 0.631 0.664 0.599 0.250 0.789 0.776 0.200 0.469
compareEncodings(encoding_agemilk$nrow_3_ncol_0,
                 dmgridout_all$nrow_6_ncol_2,
                 colorby = 'PMI',
                 simPMI = 'NullParam',
                 alpha = 0.05,
                 nsim = 2000,
                 gradient_color_low = 'steelblue',
                 gradient_color_high = 'orange',
                 filename = 'AgeYield/Bivar/AgeYield_DMAll',
                 fontsize = 30,
                 imheight = 38,
                 imwidth = 30)
## file saved to AgeYield/Bivar/AgeYield_DMAll.pdf

With the sick cows added back in, we now get a highly significant association coming into resoluation with three and six clusters for age-yield and milk order encodings respectively. Using the compareEncoding utility to characterize this association, we see that older cows are still significantly over-represented at the very front of the queue. They are now also significantly represented among Cluster #5, which is characterized by cows that entered nearer the front of the queue when cows entered the parlor from over-nighting in their free stall pen, but nearer the center or rear of the queue when entering overnighting in the pasture. Additionally, heifers were significantly under-represented in this group. That this trend seems to be modulated by pasture access, and is only found when sick cows are added to the analysis, may indicate that these cows might be altering their behavior in response to changes in energy balance or even perhaps mobility related to their pasture access. We also now see that our oldest cows are significantly under-represented among animals that consistently entered at the very rear of the queue, while parity two cows were significantly over-represented. That this trend emerges with addition of cows with recorded health events could indicate that these parity two animals may be struggling, but it is also possible that other behavioral mechanisms not related to health status are simply coming into resolution with the larger sample size.